The Claims:

1. (Original) A method for managing HPC node failure comprising:

determining that one of a plurality of HPC nodes has failed, each HPC node
comprising an integrated fabric; and

removing the failed node from a virtual list of HPC nodes, the virtual list comprising 
one logical entry for each of the plurality of HPC nodes. 
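By way of illustration only, the virtual list recited in Claim 1 might be sketched as a mapping holding one logical entry per node. The names `NodeEntry`, `VirtualList`, and `remove_failed` are hypothetical and do not appear in the claims; this is a minimal sketch under those assumptions, not the claimed implementation.

```python
from dataclasses import dataclass


@dataclass
class NodeEntry:
    """One logical entry in the virtual list (hypothetical structure)."""
    node_id: str
    status: str = "available"


class VirtualList:
    """Holds exactly one logical entry for each HPC node."""

    def __init__(self, node_ids):
        self._entries = {nid: NodeEntry(nid) for nid in node_ids}

    def remove_failed(self, node_id):
        """Remove the entry for a failed node; return it, or None if absent."""
        return self._entries.pop(node_id, None)

    def node_ids(self):
        """Return the remaining node ids in insertion order."""
        return list(self._entries)
```

A caller that has determined node "n1" to have failed would invoke `remove_failed("n1")`, after which the list no longer contains a logical entry for that node.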

2. (Original) The method of Claim 1, further comprising:

determining that at least a portion of an HPC job was being executed on the failed 
node; and 

terminating the HPC job. 

3. (Original) The method of Claim 2, further comprising: 

determining that the HPC job was associated with a subset of the plurality of HPC 
nodes; and 

deallocating the subset of HPC nodes. 

4. (Original) The method of Claim 3, each entry of the virtual list 
comprising a node status and the method further comprising changing the status of each of 
the subset of HPC nodes to "available". 
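For illustration, the steps of Claims 2 through 4 (terminating the affected job, deallocating its subset of nodes, and marking that subset "available") might be sketched as follows. The function name `handle_failure` and the dictionary layouts are assumptions made for this example only; the claims do not specify any data structure.

```python
def handle_failure(failed, jobs, status):
    """Hypothetical sketch of the failure-handling steps.

    jobs:   dict mapping job id -> set of node ids allocated to that job
    status: dict mapping node id -> status string (the virtual list)
    Returns the ids of the jobs that were terminated.
    """
    # Remove the failed node's logical entry from the virtual list.
    status.pop(failed, None)

    # Determine which jobs were executing, at least in part, on the failed node.
    terminated = [job_id for job_id, nodes in jobs.items() if failed in nodes]

    for job_id in terminated:
        # Terminate the job and deallocate its subset of nodes.
        for node in jobs.pop(job_id):
            # Surviving nodes of the subset become available again.
            if node in status:
                status[node] = "available"
    return terminated
```

In this sketch the failed node itself is not marked "available", since its entry has already been removed from the list.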

5. (Original) The method of Claim 3, further comprising: 

determining dimensions of the terminated job based on one or more job parameters 
and an associated policy; 

dynamically allocating a second subset of the plurality of HPC nodes based on the 
determined dimensions; and 

executing the terminated job on the allocated second subset. 

6. (Original) The method of Claim 5, the second subset comprising a 
substantially similar set of nodes to the first subset. 



ATTORNEY DOCKET 064747.1015
PATENT APPLICATION 10/826,959



7. (Original) The method of Claim 5, wherein dynamically allocating the 
second subset comprises: 

determining an optimum subset of nodes from a topology of unallocated HPC nodes;
and

allocating the optimum subset.



8. (Original) The method of Claim 1, further comprising:

locating a replacement HPC node for the failed HPC node; and

updating the logical entry of the failed HPC node with information on the 
replacement HPC node. 

9. (Original) The method of Claim 1, wherein determining one of the 
plurality of HPC nodes has failed comprises determining that a repeating communication has 
not been received from the failed node. 

10. (Original) The method of Claim 1, wherein determining one of the 
plurality of HPC nodes has failed is accomplished through polling. 
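The two detection approaches of Claims 9 and 10 might be illustrated as below. The function names, the timeout parameter, and the `probe` callback are hypothetical; the claims state only that a repeating communication is missed or that polling is used, so this is a sketch under those assumptions.

```python
def find_failed_by_heartbeat(last_seen, now, timeout):
    """Return nodes whose repeating communication has not arrived in time.

    last_seen: dict mapping node id -> time of the last message received
    now:       current time, in the same units as last_seen
    timeout:   assumed maximum allowed silence before a node is deemed failed
    """
    return sorted(n for n, t in last_seen.items() if now - t > timeout)


def find_failed_by_polling(node_ids, probe):
    """Return nodes that do not answer an active poll.

    probe: hypothetical callable taking a node id and returning True
           if the node responded.
    """
    return [n for n in node_ids if not probe(n)]
```

The first function is passive (the management side waits for messages), while the second is active (the management side queries each node); either result could feed the removal step of Claim 1.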

11. (Currently Amended) Software for managing HPC node failure, the software
encoded in one or more computer-readable media and when executed operable to:

determine that one of a plurality of HPC nodes has failed, each node comprising an 
integrated fabric; and 

remove the failed node from a virtual list of HPC nodes, the virtual list comprising 
one logical entry for each of the plurality of HPC nodes. 

12. (Original) The software of Claim 11, further operable to:

determine that at least a portion of an HPC job was being executed on the failed node;
and

terminate the HPC job.

13. (Original) The software of Claim 12, further operable to:

determine that the HPC job was associated with a subset of the plurality of HPC 
nodes; and 

deallocate the subset of HPC nodes. 

14. (Original) The software of Claim 13, each entry of the virtual list 
comprising a node status and the software further operable to change the status of each of the 
subset of HPC nodes to "available". 

15. (Original) The software of Claim 13, further operable to:

determine dimensions of the terminated job based on one or more job parameters and 
an associated policy; 

dynamically allocate a second subset of the plurality of HPC nodes based on the 
determined dimensions; and 

execute the terminated job on the allocated second subset. 

16. (Original) The software of Claim 15, the second subset comprising a 
substantially similar set of nodes to the first subset. 

17. (Original) The software of Claim 15, wherein the software operable to 
dynamically allocate the second subset comprises software operable to: 

determine an optimum subset of nodes from a topology of unallocated HPC nodes;
and

allocate the optimum subset. 

18. (Original) The software of Claim 11, further operable to:

locate a replacement HPC node for the failed HPC node; and

update the logical entry of the failed HPC node with information on the replacement 
HPC node.

19. (Original) The software of Claim 11, wherein the software operable to
determine one of the plurality of HPC nodes has failed comprises software operable to 
determine that a repeating communication has not been received from the failed node. 

20. (Original) The software of Claim 11, wherein the software operable to 
determine one of the plurality of HPC nodes has failed is accomplished through polling. 

21. (Original) A system for managing HPC node failure comprising:

a plurality of HPC nodes, each node including an integrated fabric; and

a management node operable to: 

determine that one of the plurality of HPC nodes has failed, each node 
comprising an integrated fabric; and 

remove the failed node from a virtual list of HPC nodes, the virtual list 
comprising one logical entry for each of the plurality of HPC nodes. 

22. (Original) The system of Claim 21, the management node further 
operable to: 

determine that at least a portion of an HPC job was being executed on the failed node;
and

terminate the HPC job. 

23. (Original) The system of Claim 22, the management node further 
operable to: 

determine that the HPC job was associated with a subset of the plurality of HPC 
nodes; and 

deallocate the subset of HPC nodes. 

24. (Original) The system of Claim 23, each entry of the virtual list 
comprising a node status and the management node further operable to change the status of 
each of the subset of HPC nodes to "available".

25. (Original) The system of Claim 23, the management node further
operable to: 

determine dimensions of the terminated job based on one or more job parameters and 
an associated policy; 

dynamically allocate a second subset of the plurality of HPC nodes based on the 
determined dimensions; and 

execute the terminated job on the allocated second subset. 

26. (Original) The system of Claim 25, the second subset comprising a 
substantially similar set of nodes to the first subset. 

27. (Original) The system of Claim 25, wherein the management node 
operable to dynamically allocate the second subset comprises the management node operable 
to: 

determine an optimum subset of nodes from a topology of unallocated HPC nodes;
and

allocate the optimum subset. 

28. (Original) The system of Claim 21, the management node further 
operable to: 

locate a replacement HPC node for the failed HPC node; and 

update the logical entry of the failed HPC node with information on the replacement 
HPC node. 



29. (Original) The system of Claim 21, wherein the management node 
operable to determine one of the plurality of HPC nodes has failed comprises the 
management node operable to determine that a repeating communication has not been 
received from the failed node. 



30. (Original) The system of Claim 21, wherein the management node 
operable to determine one of the plurality of HPC nodes has failed is accomplished through 
polling. 